Comparative Study of Distributed Resource Management Systems – SGE, LSF, PBS Pro, and LoadLeveler

Authors

  • Yonghong Yan
  • Barbara Chapman
Abstract

Distributed Resource Management Systems (D-RMS) control the usage of hard resources, such as CPU cycles, memory, disk space, and network bandwidth, in high-performance parallel computing systems. Users request resources by submitting jobs, which may be sequential or parallel. The goal of a D-RMS is to achieve the best utilization of resources and to maximize system throughput by orchestrating the assignment of hard resources to users’ jobs. In the past decade, much work has been done to survey and study these systems, but most of it takes the viewpoint of users and of the functionality a D-RMS provides. In this study, we compare currently widely deployed D-RMS from the system perspective by decomposing a D-RMS into three subsystems: the job management subsystem, the physical resource management subsystem, and the scheduling and queuing subsystem. The system architecture that organizes these three subsystems is also discussed in detail. This work contributes to D-RMS vendors and to research in distributed resource management by presenting D-RMS internals that designers can draw on in future system upgrades and improvements.

Related resources

A performance study of job management systems

Job Management Systems (JMSs) efficiently schedule and monitor jobs in parallel and distributed computing environments. Therefore, they are critical for improving the utilization of expensive resources in high-performance computing systems and centers, and an important component of grid software infrastructure. With many JMSs available commercially and in the public domain, it is difficult to c...

A Comparison of Job Management Systems in Supporting HPC ClusterTools

This paper compares the three most common job management systems and how they work with Sun HPC ClusterTools 3.1. Various aspects such as installation, customization, scheduling, and resource control issues are discussed. The three chosen systems are: Load Sharing Facility (LSF), Portable Batch System (PBS), and COmputing in DIstributed Networked Environment (CODINE)/Global Resource Director (GRD)....

Effective Utilization and Reconfiguration of Distributed Hardware Resources Using Job Management Systems

Reconfigurable hardware resources are very expensive, and yet can be underutilized. This paper describes a middleware capable of discovering underutilized computing nodes with FPGA-based accelerator boards in a networked environment. Using an extended Job management system (JMS), this middleware permits sharing reconfigurable resources at least among the members of the same organization. Tradit...

Performance Evaluation of Selected Job Management Systems

One important component of grid software infrastructure and parallel systems management is the Job Management System (JMS). With many JMSs available commercially and in public domain, it is difficult to choose the most efficient JMS for a given computing environment. All previous comparisons of JMSs had only a conceptual character. In this paper, we present the results of the first empirical st...

Effective Use of Networked Reconfigurable Resources

Distributed reconfigurable resources, such as FPGA-based accelerator boards, are expensive and often underutilized. Therefore, it is important to permit sharing these resources at least among the members of the same organization. Traditional resources, such as CPU time of loosely coupled workstations, can be shared using a variety of existing distributed computing systems. We analyzed twelve of ...

Publication date: 2004